Biomedical Data Analysis Workshop - October/November 2021

Mark Dunning

Course Introduction and Motivation

(Virtual) welcome to Sheffield

http://www.welcometosheffield.co.uk/visit

Course Structure

How you will learn

What you will not learn

Introduction to R

Motivation

Notable uses

An example from mainstream media

Biological examples

Topics covered

Packages covered

Image Credit: Aberdeen Study Group

Can’t we just do these things in Excel?

Course Data

Example plots

Example plots

R as a reporting tool

Reproducibility Crisis

Reproducibility Crisis

Talk “highlights”

Reproducible Research

Why share data?

How to share data

Things can go wrong

Fire at CRUK Manchester - April 2017

Things can go wrong

Discussion

Data Backup

Credit: CRUK_CI Bioinformatics Core

Data Backup

Credit: CRUK_CI Bioinformatics Core

Version Control

Credit: PhD Comics

Version Control

Credit: CRUK_CI Bioinformatics Core

More advanced Options

Naming of files and directories

Naming of files and directories

Credit: CRUK_CI Bioinformatics Core

Naming of files and directories

Credit: CRUK_CI Bioinformatics Core

Spreadsheet Organisation

Context

No matter how much of the analysis is automated, some manual steps are inevitably involved

Reproducible Research

Should we stop using Excel completely?

Rule 1

Rule 1 - Never work directly on the raw data

Rule 1
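One way to follow Rule 1 is to copy the raw file into a separate working folder and remove write permission from the original. The sketch below is illustrative only: the function name `make_working_copy` and the folder layout are assumptions, not part of the course material, and Python is used purely for illustration.

```python
import shutil
import stat
from pathlib import Path


def make_working_copy(raw_path, work_dir):
    """Copy the raw file into work_dir and mark the original read-only.

    Hypothetical helper: the name and folder layout are assumptions.
    """
    raw = Path(raw_path)
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)

    copy_path = work / raw.name
    # Copy first, while the original is still writable, so the copy stays editable.
    shutil.copy2(raw, copy_path)

    # Remove write permission from the original to prevent accidental edits.
    raw.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return copy_path
```

All cleaning and analysis then happens on the returned copy, and the raw file remains as it was received.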

Rule 2

Rule 2 - Maintain consistency

Example 1

Patient ID  Sex     Date of Diagnosis          Tumour Size
1           M       01-01-2013                 3.1
2           f       04-18-1998                 1.5
3           Male    1st of April 2004          105
4           Female  NA                         67
5           F       2010/03/12                 4.2
6           F                                  3.6
7           M       1994-11-05T08:15:30-05:00  232

Example 1

Regarding dates

Credit: @myusuf3

Example 1 - corrected

Patient ID  Sex  Date of Diagnosis  Tumour Size
001         M    2013-01-01         3.1
002         F    1998-04-18         1.5
003         M    2004-04-01         1.05
004         F    NA                 0.67
005         F    2010-03-12         4.2
006         F    NA                 3.6
007         M    1994-11-05         2.32
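The Sex and Date of Diagnosis corrections in Example 1 could be scripted rather than made by hand. The following is a minimal sketch, not the course's own method: the helper names, the list of accepted formats, and the choice to read `04-18-1998` as month-day-year are all assumptions. Tumour Size is left alone, because the slides do not state the rule used to rescale it.

```python
import re
from datetime import datetime

# Assumed mapping of the spellings seen in Example 1 to one-letter codes.
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

# Assumed formats; "%m-%d-%Y" treats 04-18-1998 as month-day-year.
DATE_FORMATS = ["%Y-%m-%d", "%Y/%m/%d", "%m-%d-%Y", "%d %B %Y"]


def normalise_sex(value):
    """Return "M" or "F", or "NA" when the value is not recognised."""
    return SEX_CODES.get(value.strip().lower(), "NA")


def normalise_date(text):
    """Return the date as YYYY-MM-DD, or "NA" when it cannot be parsed."""
    text = text.strip()
    if text in ("", "NA"):
        return "NA"
    # Turn "1st of April 2004" into "1 April 2004" before parsing.
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", text).replace(" of ", " ")
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    try:
        # Handles full timestamps such as 1994-11-05T08:15:30-05:00.
        return datetime.fromisoformat(cleaned).date().isoformat()
    except ValueError:
        return "NA"
```

A date such as `01-02-2013` is genuinely ambiguous, so a script like this can only apply whichever convention has been agreed; recording dates as YYYY-MM-DD from the start avoids the problem.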

Rule 3

Figure showing locations of visitors to my Prostate Cancer data portal

Rule 3 - Don’t use 0 to mean missing
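A small numerical sketch shows why Rule 3 matters: a 0 standing in for a missing value is treated as a real measurement and drags summary statistics down. The values and the helper name `mean_of_recorded` below are made up for illustration.

```python
def mean_of_recorded(values):
    """Average the values that were actually recorded (None means missing)."""
    recorded = [v for v in values if v is not None]
    if not recorded:
        return None
    return sum(recorded) / len(recorded)


# Hypothetical measurements; the third one was never taken.
sizes_with_zero = [3.1, 1.5, 0, 4.2]        # missing value coded as 0
sizes_with_missing = [3.1, 1.5, None, 4.2]  # missing value coded as missing

print(mean_of_recorded(sizes_with_zero))     # roughly 2.2
print(mean_of_recorded(sizes_with_missing))  # roughly 2.93
```

With the 0 in place, the mean is computed over four values instead of three, and nothing in the data signals that one of them is not a measurement.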

Rule 4

Patient ID  Date        Value
1           2015-06-14  213
2                       76.5
3           2015-06-18  32
4                       120.3
5                       109
6           2015-06-20
7                       143

Rule 4

Fill in all the cells

Rule 4

Example 2 - corrected

Patient ID  Date        Value
1           2015-06-14  213
2           2015-06-14  76.5
3           2015-06-18  32
4           2015-06-18  120.3
5           2015-06-18  109
6           2015-06-20  NA
7           2015-06-20  143
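The correction for Rule 4 can be sketched in code. This assumes, as the corrected table implies, that a blank Date means "same as the row above" and that a blank Value is genuinely missing; the helper names `fill_down` and `blank_to_na` are hypothetical.

```python
def fill_down(rows, column):
    """Replace blank cells in `column` with the last non-blank value above."""
    filled = []
    last_seen = "NA"  # used only if the very first cell is blank
    for row in rows:
        row = dict(row)  # copy, so the input rows are not modified
        if row[column] == "":
            row[column] = last_seen
        else:
            last_seen = row[column]
        filled.append(row)
    return filled


def blank_to_na(rows, column):
    """Mark blank cells in `column` explicitly as NA."""
    return [{**row, column: row[column] or "NA"} for row in rows]


# The uncorrected table, with blanks as empty strings.
rows = [
    {"Patient ID": "1", "Date": "2015-06-14", "Value": "213"},
    {"Patient ID": "2", "Date": "", "Value": "76.5"},
    {"Patient ID": "3", "Date": "2015-06-18", "Value": "32"},
    {"Patient ID": "4", "Date": "", "Value": "120.3"},
    {"Patient ID": "5", "Date": "", "Value": "109"},
    {"Patient ID": "6", "Date": "2015-06-20", "Value": ""},
    {"Patient ID": "7", "Date": "", "Value": "143"},
]

filled = blank_to_na(fill_down(rows, "Date"), "Value")
```

Filling down is only safe when the person who entered the data really did mean "same as above"; if a blank could also mean "unknown", the ambiguity cannot be resolved after the fact, which is the reason for filling in every cell at entry time.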

Rule 5

Rule 5

Make it a rectangle

Rule 5
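A quick check for Rule 5 is to confirm that every row of an exported CSV file has the same number of fields as the header. The sketch below is one possible check, with a hypothetical function name and made-up sample data.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return (line number, field count) for rows that do not match the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return [
        (reader.line_num, len(row))
        for row in reader
        if len(row) != len(header)
    ]


# Made-up example: the third line has only two fields.
sample = "id,date,value\n1,2015-06-14,213\n2,76.5\n3,2015-06-18,32\n"
print(find_ragged_rows(sample))
```

An empty result means the table is rectangular; any reported line is a candidate for a skipped cell, a stray note, or a second table pasted onto the same sheet.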

More

More

Computer doesn’t recognize it!

Helpful Data Validation feature in Excel

Less helpful “features” in Excel